All Questions
Tagged with scikit-learn class-imbalance
55 questions
2 votes
1 answer
69 views
Taking into account instance cost in learning?
I am generally trying to take costs into account in learning. The setup is as follows: a statistical learning problem with the usual X and y, where y is imbalanced (roughly 1% of ones). Scikit-learn ...
0 votes
1 answer
136 views
Imbalanced Cost-Sensitive Learning Workflow - How to split the data, tune hyperparameters and apply a decision threshold?
I am facing a problem with an imbalanced dataset in which I would like to detect the rare event. My questions are more about general strategy for the whole workflow and I would like to hear your thoughts ...
-1 votes
1 answer
61 views
How to deal with a heavily imbalanced test dataset?
Both my train data and test data were imbalanced, so I tried SMOTE for training. Before SMOTE: ...
4 votes
2 answers
2k views
Flipping the labels in a binary classification gives a different model and results
I have an imbalanced dataset and I want to train a binary classifier to model the dataset. Here was my approach, which resulted in (relatively) acceptable performance: 1- I made a random split to get ...
1 vote
0 answers
1k views
Downsampling in sklearn. Test and Train performance question
I have a class-imbalanced data set, and have the following setup to handle class imbalance. I first split into test and train sets, perform downsampling only on the training set, and then get the test ...
1 vote
2 answers
641 views
Evaluation Metric for Imbalanced and Ordinal Classification
I'm looking for an ML evaluation metric that would work well with imbalanced and ordinal multiclass datasets: Imagine you want to predict the severity of a disease that has 4 grades of severity where ...
2 votes
1 answer
2k views
Imbalanced data set with sample weighting - How to interpret the performance metrics?
Consider a binary classification scenario in which the True class (5%) is severely outnumbered by the False class (95%). My data set contains numeric data. I am using scikit-learn and trying some different ...
0 votes
1 answer
1k views
roc_auc_score from scikit-learn gives an error when the test label vector contains only a subset of the classes
I have an imbalanced dataset. Does it make sense to compute the ROC AUC for the classifier I created on a holdout set? Here's a very artificial MWE: ...
1 vote
1 answer
251 views
Imbalanced classification task – Discrepancy between learning curves and test set evaluation
I have a binary classification task related to customer churn for a bank. The dataset contains 10,000 instances and 11 features. The target variable is imbalanced (80% remained as customers (0), 20% ...
2 votes
1 answer
367 views
Training is not stable with extreme class imbalance
I'm dealing with a multi-class classification problem with around 30 categories. This problem has a severe class imbalance: Around 300 examples for the least common class. Around 100k examples for ...
0 votes
1 answer
707 views
Logistic regression with unbalanced data, scoring based only on rare class
I have a dataset of approx. 600,000 data points in which 0.2% (1,200 samples) are labelled as signifying a rare event. I want to use logistic regression to help me predict this rare event, but even when ...
0 votes
1 answer
28 views
Unbalanced training set from balanced data
I am looking to get an unbalanced training set with a given ratio of classA:classB from a dataset, regardless of whether it is balanced or not. The point is to analyze the influence of data imbalance on ...
0 votes
1 answer
2k views
How does class_weight work in Decision Tree?
I am interested in cost-sensitive learning, and I am trying to understand how class_weight in DecisionTree works mathematically. I read a lot of articles that ...
0 votes
2 answers
974 views
GridSearch on imbalanced datasets
I'm trying to use grid search to find the best parameters for my model. Knowing that I have to implement the NearMiss undersampling method while doing cross-validation, should I fit my grid search on my ...